1: Functions and vectors
BSTA 526: R Programming for Health Data Science
1 Before you get started
Please save a copy of this as part1_FIRSTNAME_LASTNAME.qmd and work from that (where FIRSTNAME is your first name, and LASTNAME is your last name). This way, you’ll have the original as a reference just in case.
Also, the first time you try something, try to type out the answer rather than copying and pasting. It will help you understand what’s going on, because it forces you to read the code. However, if you find yourself getting too in the weeds with typing during class, copying and pasting works too! Practice typing on your own.
2 Welcome to R Programming!
This course introduces you to R in two parts:
Part 1 focuses on working through common tasks in data science:
- importing data
- wrangling data
- visualizing data
- summarizing data
Part 2 focuses on more advanced topics:
- joining and merging data tables
- automating analyses (purrr/for loops)
We might get to:
- running basic statistical procedures
- fancy tables in Quarto with kable, gt, and gtsummary
- advanced Quarto topics
Throughout, we’ll practice reproducibility by using RStudio projects and Quarto documents as a way of sharing our work reproducibly.
3 What is R?
- R is an open source statistical and programming computer language widely used for a variety of applications.
- Why “R”??
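- S (invented at Bell Labs in 1976), together with Scheme, inspired R (free and open source! in 1992). The name "R" is a play on "S" and on the first names of R's creators, Ross Ihaka and Robert Gentleman.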
4 Learning Objectives
By the end of this session, you should be able to:
- Work within the RStudio interface to run R code in a Quarto document
- Understand basic R syntax to use functions and assign values to objects
- Create and manipulate vectors and understand how R deals with missing data
- Install and load R packages
5 Introduction to R
5.1 R/RStudio
Link to video: https://youtu.be/oFmjHxl28H0
- Winter 2025:
- Start at minute 2. The first 2 minutes are on using RStudio server in the Cloud, which we are not using.
- This video provides an overview of what the different parts of the RStudio interface are used for.
- Replace references to RMarkdown (or .Rmd) with Quarto (or .qmd).
A good reference built into RStudio is Help -> Cheatsheets -> RStudio IDE cheat sheet.
A nice summary of the RStudio anatomy is here:
Here are some useful gifs about customizing the RStudio panels
5.2 RStudio projects
How do you eat an elephant?
One bite at a time. We will go over topics related to RStudio/file management again and again this class, so don’t worry if it is confusing at first.
- We will be using RStudio projects.
- We will talk about them again later, but for now,
- open RStudio by double-clicking on the .Rproj file for part 1 (part_01.Rproj)
You may read this for more info, and watch this short video on creating new projects:
Link to video: https://youtu.be/D22THnoPA6w
5.3 Quarto (.qmd)
- See intro to Quarto from BSTA 511 Week 1
- Can view slides as html, pdf, or “continuous” webpage.
5.3.1 Create a Quarto file
- See BSTA 511/611 Day 1 section on Create a Quarto file.
5.3.2 Markdown for “word processing”
- The Quarto (qmd) document uses markdown language for text outside of code chunks.
- Some markdown resources:
- BSTA 511/611 Day 1 notes on markdown
- Official Quarto webpage on markdown basics
- From BSTA 504: Read more about markdown here.
- Skip the last two sections on Front matter and Citations, since these are different for Quarto.
5.4 Code chunks
Link to video: https://youtu.be/0iETdE7WkqU
The grey box below is a code chunk:
# basic math
4 + 5
[1] 9
- Any text that starts with a # is called a comment and is not run as code. Comments are useful for making notes for yourself.
- Below the comment is the actual code.
- How do we run the code?
Try this one out. It’s the same code as above, but with no spaces. Does it still run?
# same code as above, without spaces
4+5
[1] 9
5.5 Useful keyboard shortcuts (Tools → Keyboard Shortcuts Help)
| action | mac | windows/linux |
|---|---|---|
| Run code in qmd or script | cmd + enter | ctrl + enter |
| Add code chunk | cmd + option + i | ctrl + alt + i |
| insert <- (assignment operator) | option + - | alt + - |
| interrupt currently running code | esc | esc |
| in console, go to previously run code | up/down | up/down |
| insert %>% (pipe operator) | cmd + shift + m | ctrl + shift + m |
| search files | cmd + shift + f | ctrl + shift + f |
| render qmd | cmd + shift + k | ctrl + shift + k |
| run entire code chunk | cmd + option + c | ctrl + alt + c |
| keyboard shortcut help | option + shift + k | alt + shift + k |
6 Using functions
Link to video: https://youtu.be/aQPOhhLinZM
Below is an example of an R function:
# using a function: rounding numbers
round(3.14)
[1] 3
pi
[1] 3.141593
round(pi)
[1] 3
R functions can have multiple arguments
# using a function with more arguments
round(x = 3.14, digits = 1)
[1] 3.1
Do we have to “name” the arguments?
6.1 Getting Help
Learn more about the round() function with ?round:
?round
- We can also type ?round in the Console instead of including it in a code chunk.
# can switch order of arguments (if you name them)
round(digits = 1, x = 3.14)
[1] 3.1
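You don't have to name the arguments. The sketch below shows how R matches them. Unnamed arguments are matched by position, so for round() the first is x and the second is digits. Named arguments can appear in any order. The numbers used are arbitrary examples.

```r
# Unnamed arguments are matched by position: x first, then digits
round(3.14159, 2)
# Named arguments can be given in any order
round(digits = 2, x = 3.14159)
# Both calls give the same result
identical(round(3.14159, 2), round(digits = 2, x = 3.14159))
# Swapping unnamed arguments changes their meaning:
# here x = 2 and digits = 3.14159, so the result is 2
round(2, 3.14159)
```

Naming arguments costs a little typing but protects you from silently passing a value to the wrong argument.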
You may notice that boxes pop up as you type. These represent RStudio’s attempts to guess what you’re typing and share additional options.
There are many ways to get help. The more you learn how to get help, the easier your coding life will be. Here’s a list of options:
- Google “question + rcran” (e.g., “hist rcran” or “make a boxplot ggplot”)
- Google the error in quotes (e.g., “Evaluation error: invalid type (closure) for variable ‘***’”)
- Search the RStudio Community (now called Posit Community)
- Search Stack Overflow #r tag
- Search github for your function name to see examples or search the error
- Use generative AI (ChatGPT, Perplexity, etc.)
Post a question somewhere friendly:
- RStudio community
- to pick on my friend/coworker Emile, an example
- R for Data Science Online Learning Community - join the slack channel
- Twitter #rstats, or #rstats on other social media platforms
6.2 Challenge 1
- What does the function hist do?
- What are its main arguments?
- How did you determine this?
- Tricky bonus: what about +, which is actually a function?
7 Common errors
7.1 “Object not found”
This happens when text is entered for a non-existent variable (object):
hello
It can be due to missing quotes:
install.packages(dplyr)
or to misspellings (R is case-sensitive)!
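The sketch below captures the exact error message as text instead of letting it stop your document. It uses tryCatch(), a base R function we haven't covered yet; here it only grabs the error text. The names hello and Weight are deliberately undefined. weight is just a throwaway example object.

```r
# Capture the error message instead of stopping
msg <- tryCatch(hello, error = function(e) conditionMessage(e))
msg
# R is case-sensitive: weight exists, but Weight does not
weight <- 60
msg_case <- tryCatch(Weight, error = function(e) conditionMessage(e))
msg_case
# With quotes, "dplyr" is a character string rather than an object name,
# so install.packages("dplyr") would work (not run here)
```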
7.2 Incomplete commands
- In the console:
- When the console is waiting for a new command, the prompt line begins with >
- If the console prompt is +, then a previous command is incomplete
- You can finish typing the command in the console window
- If stressed and confused, press ESC many times (ESC = ESCAPE ME OUT OF HERE)
- In a code chunk:
- R will let you know there is an error with a red circle containing a white X (see below).
- Note that all code chunks below this one will still have the red error circles until you fix the code.
- What happens if you try to run the code below?
3 + (2*6
In your .qmd file, change the chunk option #| eval: false to #| eval: true after you fix the code error.
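The sketch below shows why R calls this command incomplete. It uses parse() from base R, which turns text into R code without running it. parse() fails on the unbalanced parenthesis because the expression never ends. Adding the missing ) fixes it.

```r
# The original line is missing a closing parenthesis
broken <- "3 + (2*6"
# parse() fails because the expression is incomplete
parse_error <- tryCatch(parse(text = broken),
                        error = function(e) conditionMessage(e))
parse_error
# Closing the parenthesis completes the command
fixed <- 3 + (2*6)
fixed
```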
7.3 “could not find function”
- This can happen when you are calling a function but haven’t loaded the package that it “lives” in.
- For example, the function day() being used below is from the lubridate package.
- What error do we get when we run the code?
day("2025-01-09")
How do we fix this code?
# either specify the package in front of ::function()
lubridate::day("2025-01-09")
[1] 9
# or load the package first (preferably at beginning of script)
library(lubridate)
Attaching package: 'lubridate'
The following object is masked from 'package:vembedr':
hms
The following objects are masked from 'package:base':
date, intersect, setdiff, union
day("2025-01-09")
[1] 9
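You can check whether R can find a function before you call it. The sketch below uses base R's exists(). The name not_a_real_function is hypothetical and is only there to trigger the error.

```r
# exists() returns TRUE if R can find an object with this name
exists("round")
exists("not_a_real_function")
# The same error message you'd see in the console, captured as text
msg <- tryCatch(not_a_real_function("2025-01-09"),
                error = function(e) conditionMessage(e))
msg
```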
Or, maybe there was a misspelling…
dsy("2025-01-09")
8 Assigning objects with <-
Link to video: https://youtu.be/pW9wkwob1Es
- <- is the primary assignment operator in R
- Some naming conventions in R:
- Object names cannot start with a number
- Object names are case sensitive
- No spaces in object names
# assigning value to an object
weight_kg <- 55
- Now that the object has been assigned, we can reference that object by running its name:
# recall object
weight_kg
[1] 55
- We can also use the object as a variable:
# multiply an object (convert kg to lb)
2.2 * weight_kg
[1] 121
- We can create a new object (variable) based on the existing one:
# assign weight conversion to object
weight_lb <- 2.2 * weight_kg
- Note that the code above only saves the value for weight_lb, but it doesn’t show us what the value is.
- To see what the value is, you can
- Check the Environment tab (this is not reproducible though)
- Add () around the whole line of code to also see the value:
# added parentheses to see value of weight_lb in output
(weight_lb <- 2.2 * weight_kg)
[1] 121
- Below we assign a new value to weight_kg.
- Did this change the value of weight_lb?
# reassign new value to an object
weight_kg <- 100
You can think of the names of objects like sticky notes. You have the option to place the sticky note (name) on any value you choose. You can pick up the sticky note and place it on another value, but you need to explicitly tell R when you want values assigned to certain objects.
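You can check the sticky-note idea directly. The short sketch below repeats the steps above in one place. weight_lb keeps its old value until you re-run the line that computes it.

```r
weight_kg <- 55
weight_lb <- 2.2 * weight_kg   # weight_lb is computed once, from 55
weight_kg <- 100               # move the "weight_kg" sticky note to 100
weight_kg
weight_lb                      # still 121: it was not recalculated
# To update weight_lb, you have to re-run the calculation
(weight_lb <- 2.2 * weight_kg)
```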
8.1 Removing objects
- You can clear the entire environment using the button at the top of the Environment panel with a picture of a broom.
- This may seem extreme, but don’t worry! We can re-create all the work we’ve already done by running each line of code again.
- To remove an individual object, use the remove() function:
# remove object
remove(weight_lb)
8.2 Challenge 2
What is the value of each item at each step? (Hint: you can see the value of an object by typing its name, as with the mass line below.)
mass <- 47.5 # 1. mass?
mass
[1] 47.5
width <- 122 # 2. width?
mass <- mass * 2.0 # 3. mass?
width <- width - 20 # 4. width?
mass_index <- mass/width # 5. mass_index?
Make your answers here:
9 Vectors
Link to video: https://youtu.be/0qLgfpvzBqI
9.1 Creating vectors
c() is for combine or concatenate
# assign vector
ages <- c(50, 55, 60, 65)
# recall vector
ages
[1] 50 55 60 65
9.2 Learning things about vectors
# how many things are in the object?
length(ages)
[1] 4
# what type of object?
class(ages)
[1] "numeric"
# performing functions with vectors
mean(ages)
[1] 57.5
range(ages)
[1] 50 65
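mean() is simply the sum of the values divided by how many values there are. The quick check below uses sum(), another base R function, together with length() from above.

```r
ages <- c(50, 55, 60, 65)
# mean is the sum of the values divided by how many there are
sum(ages) / length(ages)
mean(ages) == sum(ages) / length(ages)
```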
9.3 Character vectors
# vector of body parts
organs <- c("lung", "prostate", "breast")
In the example above, each word within the vector is encased in quotation marks, indicating these are character data, rather than object names.
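To see why the quotes matter, the sketch below builds the same vector without them. R then looks for objects named lung, prostate, and breast. None of them exist, so it stops at the first one.

```r
# Without quotes, R treats lung as an object name
msg <- tryCatch(c(lung, prostate, breast),
                error = function(e) conditionMessage(e))
msg
# With quotes, each value is character data
organs <- c("lung", "prostate", "breast")
organs
```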
9.4 Challenge 3
Please answer the following questions about organs:
- How many values are in organs?
- What type of object is organs?
Answers here:
10 Object (data) types and Vectors
- character: sometimes referred to as string data; values are surrounded by quotes
- numeric: real numbers (decimals), sometimes referred to as “double”
- integer: a subset of numeric in which numbers are stored as integers
- logical: Boolean data (TRUE and FALSE)
- dates: can store data as seconds, hours, days, months, years, or combinations thereof. We recommend the lubridate package for working with dates.
- complex: complex numbers with real and imaginary parts (e.g., 1 + 4i)
- raw: bytes of data (machine readable, but not human readable)
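The sketch below shows how R reports each of these types. typeof() and class() are base R functions. The L suffix in 5L is how you type an integer directly. as.Date() is base R's date constructor; lubridate has its own helpers for dates. Note that typeof() reports numeric values as "double".

```r
typeof("lung")                # "character"
typeof(3.14)                  # "double" (class() would say "numeric")
typeof(5L)                    # "integer"
typeof(TRUE)                  # "logical"
typeof(1 + 4i)                # "complex"
typeof(as.raw(255))           # "raw"
class(as.Date("2025-01-09"))  # "Date"
# Integers are a subset of numeric
is.numeric(5L)                # TRUE
```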
10.1 Challenge 4
- R usually interprets data types in the background during most operations.
- The following code is designed to cause some unexpected results in R.
- What is unusual about each of the following objects?
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
hola <- c("hi", "guten tag", hello)
11 Manipulating vectors
Link to video: https://youtu.be/J0y8Dtvm7bQ
11.1 Adding values to vectors
ages
[1] 50 55 60 65
# add a value to end of vector
(ages <- c(ages, 90))
[1] 50 55 60 65 90
# add value at the beginning
(ages <- c(30, ages))
[1] 30 50 55 60 65 90
11.2 Extracting (or excluding) values from vectors
# extracting second value
organs[2]
[1] "prostate"
# excluding second value
organs[-2]
[1] "lung" "breast"
# extracting first and third values
organs[c(1, 3)]
[1] "lung" "breast"
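Extracting or excluding values returns a new vector. organs itself is unchanged unless you assign the result. In the sketch below, the name organs_short is just an illustrative choice.

```r
organs <- c("lung", "prostate", "breast")
# Exclude the first and third values
organs[-c(1, 3)]
# organs still has all three values
length(organs)
# To keep the shorter vector, assign it to a (new) object
organs_short <- organs[-2]
organs_short
```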
12 Missing data
NA indicates a missing value in R. NA is not a character!!!
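Because NA is not a character, the text "NA" in quotes is an ordinary string. The quick check below uses is.na() from base R, which tests each element for missingness.

```r
is.na(NA)                  # TRUE: a true missing value
is.na("NA")                # FALSE: just the letters N and A
is.na(c(2, 4, 4, NA, 6))   # checks every element of the vector
```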
# create a vector with missing data
heights <- c(2, 4, 4, NA, 6)
- What happens when we try to calculate the mean or max of a vector with missing data?
# calculate mean and max on vector with missing data
mean(heights)
[1] NA
max(heights)
[1] NA
- How do we fix this?
# add argument to remove NA
mean(heights, na.rm = TRUE)
[1] 4
max(heights, na.rm = TRUE)
[1] 6
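- Or, we can use na.omit(), but be careful with this!! It removes the missing elements, so the result is shorter than the original vector (and on a data frame it drops every row containing an NA).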
# remove incomplete cases
na.omit(heights)
[1] 2 4 4 6
attr(,"na.action")
[1] 4
attr(,"class")
[1] "omit"
mean(na.omit(heights))
[1] 4
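Another way to count and drop missing values combines is.na() with the ! ("not") operator. This minimal sketch keeps only the elements where is.na() is FALSE. Unlike na.omit(), the result carries no extra attributes.

```r
heights <- c(2, 4, 4, NA, 6)
# How many values are missing? (each TRUE counts as 1)
sum(is.na(heights))
# Keep only the values that are NOT missing
heights[!is.na(heights)]
mean(heights[!is.na(heights)])
```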
12.1 Challenge 5
Complete the following tasks after creating this vector (Note: there are multiple solutions):
- Remove NAs from more_heights (assign the result to the object more_heights_complete)
- Calculate the median() of more_heights_complete
# create vector
more_heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
# remove NAs
# calculate the median
13 Vectorization
- Most of R’s functions are “vectorized”
- This means that the function will operate on all elements of a vector without needing to use other advanced programming tools such as for loops (more on that later).
We can see this when we try to add vectors together:
x <- 1:4
y <- 6:9
z <- x + y
z
[1] 7 9 11 13
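To see what vectorization saves you, the sketch below writes the same addition as a for loop (loops are covered later in the course). Both approaches give the same values. The vectorized version is shorter and easier to read.

```r
x <- 1:4
y <- 6:9
# Element-by-element addition with a loop
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}
z_loop
# The vectorized version does the same in one step
z <- x + y
all(z == z_loop)
```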
All mathematical and logical operators are vectorized functions:
z^2
[1] 49 81 121 169
z + 1
[1] 8 10 12 14
z == 9
[1] FALSE TRUE FALSE FALSE
z > 9
[1] FALSE FALSE TRUE TRUE
x / y
[1] 0.1666667 0.2857143 0.3750000 0.4444444
But other common functions are as well:
z <- x / y
round(z, 2)
[1] 0.17 0.29 0.38 0.44
z <- c("no", "nope", "maybe")
paste(z, "hi")
[1] "no hi" "nope hi" "maybe hi"
stringr::str_replace(z, "o","7")
[1] "n7" "n7pe" "maybe"
14 R packages
- Packages are add-ons that contain functions and/or data.
- Usually the functions in a package are related to a certain type of data task or analysis method.
- You only need to install packages once.
- You need to “load” the packages that you need for your code
- every time you start R AND
- you need to have the code to load them at the top of your qmd or R script.
14.1 Loading packages: library() or pacman::p_load()
You can load packages with the library() function or the p_load() function in the pacman package.
The following code loads two packages, though the tidyverse package is actually a suite of many packages.
- This code assumes you have already installed the packages!!!
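If you're unsure whether the packages are installed, the sketch below checks without attaching them. It uses base R's requireNamespace(), which returns TRUE or FALSE. The object names pkgs and installed are just illustrative.

```r
pkgs <- c("tidyverse", "janitor")
# TRUE if the package is installed (its namespace can be loaded)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed
# Names of packages that still need install.packages()
pkgs[!installed]
```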
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ tibble 3.2.1
✔ ggplot2 3.5.1 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ lubridate::hms() masks vembedr::hms()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
# OR do this:
pacman::p_load(tidyverse, janitor)
15 Wrapping up
Today we covered
- R/RStudio and Quarto
- Functions
- Working with objects (vectors) and determining data types
- Missing data
- Vectorization
- R Packages
16 Post Class Survey
Please fill out the post-class survey. I will summarize muddiest and clearest points before each class. Your responses are anonymous in that I separate your names from the survey answers before compiling/reading.
17 Acknowledgements
- This Intro to R was copied from the BSTA 504 Winter 2023 course, taught by Jessica Minnier. I made minor modifications, primarily to update the material from RMarkdown to Quarto and to add links to an introduction to Quarto from BSTA 511/611.
- Minnier’s Acknowledgements:
- This intro to R was adapted from material from Kate Hertweck’s Intro to R course from http://fredhutch.io, the Intro to R OCTRI BERD course I taught with Meike Niederhausen, and of course, Ted Laderas’ 2021 materials.